
    Self-Updating Models with Error Remediation

    Many environments currently employ machine learning models for data processing and analytics that were built using a limited number of training data points. Once deployed, the models are exposed to significant amounts of previously-unseen data, not all of which is representative of the original, limited training data. However, updating these deployed models can be difficult due to logistical, bandwidth, time, hardware, and/or data sensitivity constraints. We propose a framework, Self-Updating Models with Error Remediation (SUMER), in which a deployed model updates itself as new data becomes available. SUMER uses techniques from semi-supervised learning and noise remediation to iteratively retrain a deployed model using intelligently-chosen predictions from the model as the labels for new training iterations. A key component of SUMER is the notion of error remediation, as self-labeled data can be susceptible to the propagation of errors. We investigate the use of SUMER across various data sets and iterations. We find that self-updating models (SUMs) generally perform better than models that do not attempt to self-update when presented with additional previously-unseen data. This performance gap is accentuated in cases where there are only limited amounts of initial training data. We also find that the performance of SUMER is generally better than the performance of SUMs, demonstrating a benefit in applying error remediation. Consequently, SUMER can autonomously enhance the operational capabilities of existing data processing systems by intelligently updating models in dynamic environments. Comment: 17 pages, 13 figures, published in the proceedings of the Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications II conference in the SPIE Defense + Commercial Sensing 2020 symposium.
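
    The self-labeling loop described above can be illustrated with a short sketch. This is not the authors' SUMER code; it assumes a scikit-learn-style classifier and uses a simple confidence threshold as a stand-in for the paper's "intelligently-chosen predictions" and error-remediation step, and names such as self_update and confidence_threshold are illustrative only.

        # Minimal self-training sketch (not the SUMER implementation itself).
        # The deployed model labels new data; only high-confidence predictions
        # are kept as pseudo-labels (a crude stand-in for error remediation),
        # and the model is retrained on the enlarged training set.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def self_update(model, X_train, y_train, X_new,
                        confidence_threshold=0.95, iterations=5):
            for _ in range(iterations):
                if len(X_new) == 0:
                    break
                proba = model.predict_proba(X_new)
                confidence = proba.max(axis=1)
                pseudo_labels = model.classes_[proba.argmax(axis=1)]
                keep = confidence >= confidence_threshold  # remediation stand-in
                if not keep.any():
                    break
                X_train = np.vstack([X_train, X_new[keep]])
                y_train = np.concatenate([y_train, pseudo_labels[keep]])
                X_new = X_new[~keep]
                model.fit(X_train, y_train)
            return model

        # Example usage with a small initial model:
        # model = RandomForestClassifier().fit(X_small, y_small)
        # model = self_update(model, X_small, y_small, X_unlabeled)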

    Dynamic Analysis of Executables to Detect and Characterize Malware

    Ensuring the integrity of systems that process sensitive information and control many aspects of everyday life is a pressing need. We examine the use of machine learning algorithms to detect malware using the system calls generated by executables, which alleviates attempts at obfuscation because the behavior is monitored rather than the bytes of an executable. We examine several machine learning techniques for detecting malware, including random forests, deep learning techniques, and liquid state machines. The experiments examine the effects of concept drift on each algorithm to understand how well the algorithms generalize to novel malware samples by testing them on data that was collected after the training data. The results suggest that each of the examined machine learning algorithms is a viable solution to detect malware, achieving between 90% and 95% class-averaged accuracy (CAA). In real-world scenarios, the performance evaluation on an operational network may not match the performance achieved in training. Namely, the CAA may be about the same, but the values for precision and recall over the malware can change significantly. We structure experiments to highlight these caveats and offer insights into expected performance in operational environments. In addition, we use the induced models to gain a better understanding of what differentiates the malware samples from the goodware, which can further be used as a forensics tool to understand what the malware (or goodware) was doing and to provide directions for investigation and remediation. Comment: 9 pages, 6 tables, 4 figures.
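
    The class-averaged accuracy (CAA) reported above is assumed here to be the unweighted mean of per-class recalls (i.e., balanced accuracy); the short sketch below computes that metric. The exact definition and the function name class_averaged_accuracy are assumptions for illustration, not taken verbatim from the paper.

        # Sketch of class-averaged accuracy (CAA), assumed to be the unweighted
        # mean of per-class recalls; this is the usual definition but is not
        # quoted from the paper.
        import numpy as np
        from sklearn.metrics import confusion_matrix

        def class_averaged_accuracy(y_true, y_pred):
            cm = confusion_matrix(y_true, y_pred)          # rows: true classes
            per_class_recall = cm.diagonal() / cm.sum(axis=1)
            return per_class_recall.mean()

        # Example with goodware=0 and malware=1:
        # y_true = [0, 0, 0, 1, 1]; y_pred = [0, 0, 1, 1, 1]
        # per-class recalls are 2/3 and 1.0, so CAA is about 0.83.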

    Tracking Cyber Adversaries with Adaptive Indicators of Compromise

    A forensics investigation after a breach often uncovers network and host indicators of compromise (IOCs) that can be deployed to sensors to allow early detection of the adversary in the future. Over time, the adversary will change tactics, techniques, and procedures (TTPs), which will also change the data generated. If the IOCs are not kept up-to-date with the adversary's new TTPs, the adversary will no longer be detected once all of the IOCs become invalid. Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular expressions (regexes), up-to-date with a dynamic adversary. Our framework solves the TTK problem in an automated, cyclic fashion to bracket a previously discovered adversary. This tracking is accomplished through a data-driven approach of self-adapting a given model based on its own detection capabilities. In our initial experiments, we found that the true positive rate (TPR) of the adaptive solution degrades much less significantly over time than that of the naive solution, suggesting that self-updating the model allows the continued detection of positives (i.e., adversaries). The cost for this performance is in the false positive rate (FPR), which increases over time for the adaptive solution but remains constant for the naive solution. However, the difference in overall detection performance between the two methods, as measured by the area under the curve (AUC), is negligible. This result suggests that self-updating the model over time should be done in practice to continue to detect known, evolving adversaries. Comment: This was presented at the 4th Annual Conf. on Computational Science & Computational Intelligence (CSCI'17), held Dec 14-16, 2017 in Las Vegas, Nevada, US.
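
    A rough sketch of the cyclic adaptation idea follows: samples matched by the current regex IOCs are treated as fresh examples of the adversary and used to rebuild the IOC set for the next time window. Building the new IOCs as a literal alternation is a deliberately naive placeholder, not the paper's actual regex-generalization method, and the function names are illustrative only.

        # Rough sketch of the self-adapting IOC cycle (naive placeholder, not
        # the paper's method): detections made with the current regexes become
        # the examples from which the next regex set is built.
        import re

        def detect(regexes, samples):
            """Return the samples matched by any current indicator."""
            compiled = [re.compile(r) for r in regexes]
            return [s for s in samples if any(p.search(s) for p in compiled)]

        def rebuild_iocs(detected):
            """Placeholder: collapse detections into one literal alternation."""
            return ["|".join(re.escape(s) for s in detected)] if detected else []

        def track_the_known(initial_regexes, windows):
            """Cycle over time windows, adapting the IOC set after each one."""
            regexes = list(initial_regexes)
            for samples in windows:   # each window is a list of observed strings
                hits = detect(regexes, samples)
                if hits:              # adapt only when something was detected
                    regexes = rebuild_iocs(hits)
            return regexes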

    Plant Trait Diversity Buffers Variability in Denitrification Potential over Changes in Season and Soil Conditions

    BACKGROUND: Denitrification is an important ecosystem service that removes nitrogen (N) from N-polluted watersheds, buffering soil, stream, and river water quality from excess N by returning N to the atmosphere before it reaches lakes or oceans and leads to eutrophication. The denitrification enzyme activity (DEA) assay is widely used for measuring denitrification potential. Because DEA is a function of enzyme levels in soils, most ecologists studying denitrification have assumed that DEA is less sensitive to ambient levels of nitrate (NO3-) and soil carbon, and thus less variable over time, than field measurements. In addition, plant diversity has been shown to have strong effects on microbial communities and belowground processes and could potentially alter the functional capacity of denitrifiers. Here, we examined three questions: (1) Does DEA vary through the growing season? (2) If so, can we predict DEA variability with environmental variables? (3) Does plant functional diversity affect DEA variability? METHODOLOGY/PRINCIPAL FINDINGS: The study site is a restored wetland in North Carolina, US, with native wetland herbs planted in monocultures or mixes of four or eight species. We found that denitrification potentials for soils collected in July 2006 were significantly greater than for soils collected in May and late August 2006 (p<0.0001). Similarly, microbial biomass standardized DEA rates were significantly greater in July than in May and August (p<0.0001). Of the soil variables measured (soil moisture, organic matter, total inorganic nitrogen, and microbial biomass), none consistently explained the pattern observed in DEA through time. There was no significant relationship between DEA and plant species richness or functional diversity. However, the seasonal variance in microbial biomass standardized DEA rates was significantly inversely related to plant species functional diversity (p<0.01). CONCLUSIONS/SIGNIFICANCE: These findings suggest that higher plant functional diversity may support a more constant level of DEA through time, buffering the ecosystem from changes in season and soil conditions.

    Harnessing the NEON data revolution to advance open environmental science with a diverse and data-capable community

    It is a critical time to reflect on the National Ecological Observatory Network (NEON) science to date as well as envision what research can be done right now with NEON (and other) data and what training is needed to enable a diverse user community. NEON became fully operational in May 2019 and has pivoted from planning and construction to operation and maintenance. In this overview, the history of and foundational thinking around NEON are discussed. A framework of open science is described with a discussion of how NEON can be situated as part of a larger data constellation, across existing networks and different suites of ecological measurements and sensors. Next, a synthesis of early NEON science, based on >100 existing publications, funded proposal efforts, and emergent science at the very first NEON Science Summit (hosted by Earth Lab at the University of Colorado Boulder in October 2019), is provided. Key questions that the ecology community will address with NEON data in the next 10 yr are outlined, from understanding drivers of biodiversity across spatial and temporal scales to defining complex feedback mechanisms in human–environmental systems. Last, the essential elements needed to engage and support a diverse and inclusive NEON user community are highlighted: training resources and tools that are openly available, funding for broad community engagement initiatives, and a mechanism to share and advertise those opportunities. NEON users require both the skills to work with NEON data and the ecological or environmental science domain knowledge to understand and interpret them. This paper synthesizes early directions in the community's use of NEON data, and opportunities for the next 10 yr of NEON operations in emergent science themes, open science best practices, education and training, and community building.